Abstract

In downlink multiuser multiple-input multiple-output (MIMO) systems, block diagonalization (BD)
is a practical linear precoding scheme that achieves the same degrees of freedom (DoF) as the optimal
linear/nonlinear precoding schemes. However, its sum-rate performance is rather poor in the practical
SNR regime due to the transmit power boost problem. In this paper, we propose an improved linear
precoding scheme over BD with a so-called "effective-SNR-enhancement" technique. The transmit
covariance matrices are obtained by first solving a power minimization problem subject to the minimum
rate constraint achieved by BD, and then properly scaling the solution to satisfy the power constraints.
It is proved that this approach equivalently enhances the system SNR, and hence compensates for the
transmit power boost problem associated with BD. The power minimization problem is in general
non-convex. We therefore propose an efficient algorithm that solves the problem heuristically. Simulation
results show significant sum-rate gains over the optimal BD and the existing minimum mean square
error (MMSE) based precoding schemes.
I. Introduction

Traditional approaches for downlink inter-cell interference management, such as frequency
reuse, coordinated scheduling or beamforming techniques [1], mostly follow the notion of
"interference avoidance". Recent work on multi-cell cooperative processing (MCP) [2], with the
idea of exploiting the interfering links instead of simply avoiding them, shows that the spectral
efficiency can be significantly enhanced by allowing joint transmission from the interfering base
stations (BSs). In principle, MCP transforms the multi-cell multi-user network into a giant
multi-user system, where the resources can be more efficiently utilized. In the ideal case, downlink
MCP-enabled networks are equivalent to broadcast channels (BC), where dirty-paper coding (DPC)
is capacity achieving [3]. However, DPC is generally too complex for practical implementation
in real-time systems due to its complicated nonlinear encoding and decoding processes. As a
consequence, linear precoding schemes have drawn a lot of attention since they can achieve a
reasonable balance between complexity and performance [4], [5], [6], [7]. One class of linear
precoding schemes of particular interest is block diagonalization (BD), which can be viewed as an
extension of zero-forcing channel inversion in the multiple-input single-output (MISO) broadcast
channels, e.g., [8], [9], to the more general multiuser MIMO networks. With BD, the inter-user
interference is completely eliminated by restricting the precoding matrix for each mobile station
(MS) to be orthogonal to the channels associated with all other MSs. The initial studies on BD
mostly focus on single-cell systems, where the sum-power constraint is generally considered
[10], [11], [12], [13]. The extension to multi-cell networks with per-BS power constraints is
non-trivial [14], [15]. In [15], the weighted sum rate maximization problem with BD was formulated
as a convex optimization problem, from which a closed-form expression for the optimal BD
precoders was derived. The main advantages of BD lie in its simplicity and good performance
at high SNR. However, it gives quite poor performance in the low to medium SNR regime due
to the transmit power boost problem.

One seemingly straightforward solution to improving the low-to-medium SNR performance of BD
is MMSE-based precoding. For the special case of single-antenna receivers, a
regularized channel inversion scheme was proposed in [9], with the regularization parameter
inversely proportional to SNR. Such techniques were extended to multiuser MIMO systems
with a sum-power constraint [16], [17]. For multi-cell cooperative networks with per-BS power
constraints, the authors in [18] proposed to decompose the precoding matrix into a preliminary
matrix and a diagonal power control matrix, where the preliminary matrix was designed to have
the MMSE structure in order to balance the noise and interference effects. Another MMSE-based
precoding scheme under per-BS power constraints was proposed in [19], where the sum-MSE is
minimized directly. However, due to the complicated mathematical structure, only a locally optimal
solution can be obtained, and it requires iteratively solving a sequence of convex problems. As will
be shown in Section V, under per-BS power constraints, although the MMSE-based precoding
schemes can provide a certain performance gain over BD at low SNR, the achievable sum rates
are lower than those achieved by BD as SNR increases. In other words, the existing MMSE-based
precoding algorithms fail to achieve the same DoF as BD.

In this work, we focus on MCP-enabled downlink networks under per-BS power constraints.
The main objective is to propose an efficient scheme that improves the performance of
BD in the low to medium SNR regime, while preserving its good performance at high SNR.
Unlike BD, the proposed scheme takes the noise effect into consideration and interference leakage
is allowed. The performance gain is mainly attributed to a so-called effective-SNR-enhancement
technique: a power minimization problem with the minimum rate constraint achieved by BD is
solved, and the obtained transmit covariance matrices are properly scaled to satisfy the power
constraint. This technique provides a method to compensate for the transmit power boost problem
associated with BD. The power minimization problem is non-convex in general due to the non-convex
rate and rank constraints. To tackle this issue, we first convexify the rate constraints with a
Taylor approximation and then solve the rank-relaxed convexified problem in the dual domain.
A closed-form solution in terms of the dual variables is then obtained. With such an expression,
it is found that the solution is also optimal for the rank-constrained non-convex problem since
it automatically satisfies the rank constraints. The proposed scheme is efficient since eventually
only one convex optimization problem needs to be solved.

The rest of the paper is organized as follows. Section II introduces the system model and
problem formulation. Section III reviews the optimal BD under per-BS power constraints.
Section IV presents the proposed scheme, and numerical results are given in Section V. Finally,
conclusions are given in Section VI.



Notations: Throughout this paper, scalars are denoted by italicized letters. Boldface lower-
and upper-case letters denote vectors and matrices, respectively. $\mathbf{I}$ denotes the identity matrix
and $\mathbf{0}$ denotes an all-zero matrix. For a square matrix $\mathbf{S}$, $\mathrm{Tr}(\mathbf{S})$, $|\mathbf{S}|$, $\mathbf{S}^{-1}$ and $\mathbf{S}^{1/2}$ denote the
trace, determinant, inverse and square-root of $\mathbf{S}$, respectively. $\mathbf{S} \succeq \mathbf{0}$ and $\mathbf{S} \succ \mathbf{0}$ represent
that $\mathbf{S}$ is positive semi-definite and positive definite, respectively. $\mathbb{C}^{M \times N}$ denotes the space of
$M \times N$ complex matrices. $\|\mathbf{x}\|_2$ is the Euclidean norm of a complex vector $\mathbf{x}$. $\mathrm{Diag}(\mathbf{x})$ denotes
a diagonal matrix with the main diagonal given by $\mathbf{x}$. For an arbitrary matrix $\mathbf{X}$, $\mathbf{X}^T$, $\mathbf{X}^H$
and $\mathrm{rank}\{\mathbf{X}\}$ represent the transpose, conjugate transpose and rank of $\mathbf{X}$, respectively. $\mathrm{vec}(\mathbf{X})$
denotes a column vector obtained by stacking all the columns of $\mathbf{X}$. $\sim$ means "distributed as". $\mathcal{CN}(\mathbf{x}, \mathbf{S})$
represents the circularly symmetric complex Gaussian random vector with mean $\mathbf{x}$ and covariance
matrix $\mathbf{S}$.

II. System Model and Problem Formulation 

We consider a downlink multi-cell cooperative network with $K_t$ BSs, each equipped with $N_t$
antennas, as shown in Fig. 1. Denote the total number of transmitting antennas as $M = K_t N_t$. At
each time slot, $K_r$ MSs are scheduled and served by all the cooperating BSs. Each MS has $N_r$
antennas and thus can receive up to $N_r$ data streams. Denote by $\mathbf{d}_k$ the information-bearing signal
for the $k$th mobile station (denoted as MS$_k$), where $\mathbf{d}_k \in \mathbb{C}^{N_r \times 1}$. Assume that a Gaussian codebook
is used and $\mathbf{d}_k \sim \mathcal{CN}(\mathbf{0}, \mathbf{I}), \forall k$. Perfect channel state information (CSI) at the BSs is assumed
and the precoding matrices for all the MSs are jointly determined. The total number of transmit
antennas is assumed to be no less than the total number of receiving antennas of the scheduled users,
i.e., $M \geq K_r N_r$. In the sequel, we assume that $M = K_r N_r$ for simplicity. The received signal
at MS$_k$ is then given by

$$\mathbf{y}_k = \mathbf{H}_k \mathbf{W}_k \mathbf{d}_k + \sum_{i=1, i \neq k}^{K_r} \mathbf{H}_k \mathbf{W}_i \mathbf{d}_i + \mathbf{n}_k, \quad k = 1, \ldots, K_r \tag{1}$$

where $\mathbf{H}_k = [\mathbf{H}_{k1}\ \mathbf{H}_{k2}\ \cdots\ \mathbf{H}_{kK_t}] \in \mathbb{C}^{N_r \times M}$ denotes the channel matrix for MS$_k$, which is
assumed to be of full row rank. $\mathbf{H}_{kj} \in \mathbb{C}^{N_r \times N_t}$ is the channel from the $j$th base station (denoted as
BS$_j$) to MS$_k$. $\mathbf{W}_k \in \mathbb{C}^{M \times N_r}$ is the precoding matrix for MS$_k$, with each column corresponding
to one data stream. It is possible that the number of data streams for MS$_k$ is less than $N_r$, in
which case the corresponding columns of $\mathbf{W}_k$ are set to zero vectors. $\mathbf{n}_k \in \mathbb{C}^{N_r \times 1}$ denotes the
receiver noise. Without loss of generality, we assume that $\mathbf{n}_k \sim \mathcal{CN}(\mathbf{0}, \mathbf{I}), \forall k$. Under single-user
decoding with multi-user interference treated as noise, the achievable rate $R_k$ for
MS$_k$ is given as [20]

$$R_k = \log \frac{\left| \mathbf{I} + \sum_{i=1}^{K_r} \mathbf{H}_k \mathbf{W}_i \mathbf{W}_i^H \mathbf{H}_k^H \right|}{\left| \mathbf{I} + \sum_{i=1, i \neq k}^{K_r} \mathbf{H}_k \mathbf{W}_i \mathbf{W}_i^H \mathbf{H}_k^H \right|}, \quad k = 1, \ldots, K_r \tag{2}$$



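Denote the transmit covariance matrix for MS$_k$ as $\mathbf{S}_k = \mathrm{E}[\mathbf{W}_k \mathbf{d}_k \mathbf{d}_k^H \mathbf{W}_k^H] = \mathbf{W}_k \mathbf{W}_k^H$. Then
$\mathbf{S}_k \in \mathbb{C}^{M \times M}$, $\mathbf{S}_k \succeq \mathbf{0}$ and $\mathrm{rank}\{\mathbf{S}_k\} \leq N_r$. For BS$_j$, define a binary matrix as [15]

$$\mathbf{B}_j \triangleq \mathrm{Diag}\big(\underbrace{0, \cdots, 0}_{(j-1)N_t}, \underbrace{1, \cdots, 1}_{N_t}, \underbrace{0, \cdots, 0}_{(K_t-j)N_t}\big) \tag{3}$$

Without loss of generality, assume that all BSs have the same power constraint $P$. Then
finding the optimal linear precoder for sum rate maximization under per-BS power constraints
is equivalent to solving the following optimization problem

$$\begin{aligned}
\text{(P1):} \quad \underset{\{R_k\}, \{\mathbf{S}_k\}}{\text{maximize}} \quad & \sum_{k=1}^{K_r} R_k && (4) \\
\text{subject to} \quad & R_k \leq \log \frac{\left| \mathbf{I} + \sum_{i=1}^{K_r} \mathbf{H}_k \mathbf{S}_i \mathbf{H}_k^H \right|}{\left| \mathbf{I} + \sum_{i=1, i \neq k}^{K_r} \mathbf{H}_k \mathbf{S}_i \mathbf{H}_k^H \right|}, \quad \forall k && (5) \\
& \sum_{k=1}^{K_r} \mathrm{Tr}(\mathbf{B}_j \mathbf{S}_k) \leq P, \quad \forall j && (6) \\
& \mathbf{S}_k \succeq \mathbf{0}, \quad \mathrm{rank}\{\mathbf{S}_k\} \leq N_r, \quad \forall k && (7)
\end{aligned}$$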



where (6) represents the per-BS power constraints. Note that (P1) optimizes over the transmit
covariance matrices $\{\mathbf{S}_k\}$ instead of the precoding matrices. The explicit rank constraint is
necessary since otherwise the ranks of the resulting transmit covariance matrices may exceed
$N_r$, which is impractical due to the limited number of antennas at the receivers. (P1) is
non-convex due to the non-convex rate and rank constraints. Therefore, it is difficult to find a
globally optimal solution efficiently.
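As a minimal illustration of the per-BS power constraint (6), the sketch below uses the fact that $\mathbf{B}_j$ is a 0/1 diagonal selector, so $\mathrm{Tr}(\mathbf{B}_j \mathbf{S}_k)$ is simply the sum of the diagonal entries of $\mathbf{S}_k$ that belong to the antennas of BS$_j$. The function names, the nested-list matrix representation and the 0-based BS index are illustrative assumptions and not part of the original formulation.

```python
def selector_diagonal(j, num_bs, antennas_per_bs):
    """Diagonal of B_j in (3); j is 0-based here (an assumption of this sketch)."""
    diag = [0.0] * (num_bs * antennas_per_bs)
    for m in range(j * antennas_per_bs, (j + 1) * antennas_per_bs):
        diag[m] = 1.0
    return diag


def bs_power(j, covariances, num_bs, antennas_per_bs):
    """Left-hand side of (6): sum over users k of Tr(B_j S_k)."""
    b = selector_diagonal(j, num_bs, antennas_per_bs)
    # B_j is diagonal, so only the diagonal of each S_k contributes to the trace.
    return sum(b[m] * s[m][m].real for s in covariances for m in range(len(b)))


def per_bs_feasible(covariances, num_bs, antennas_per_bs, p_max, tol=1e-9):
    """True if every BS respects the common power budget P = p_max."""
    return all(
        bs_power(j, covariances, num_bs, antennas_per_bs) <= p_max + tol
        for j in range(num_bs)
    )
```

Each covariance matrix is passed as a square nested list of length $M = K_t N_t$; only its diagonal is read.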



III. BD with Per-BS Power Constraints

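Under the zero inter-user interference constraint, it has been shown that (P1) can be formulated
as a convex optimization problem, from which the optimal BD solution can be efficiently
obtained. This section reviews BD under per-BS power constraints, which is mainly based on
[15]. BD completely eliminates the inter-user interference by ensuring that $\mathbf{H}_i \mathbf{W}_k = \mathbf{0}, \forall i \neq k$,
or equivalently

$$\mathbf{H}_i \mathbf{S}_k \mathbf{H}_i^H = \mathbf{0}, \quad \forall i \neq k \tag{8}$$

Define $\mathbf{G}_k = [\mathbf{H}_1^T \cdots \mathbf{H}_{k-1}^T\ \mathbf{H}_{k+1}^T \cdots \mathbf{H}_{K_r}^T]^T \in \mathbb{C}^{N_r(K_r-1) \times M}$. Performing the singular value
decomposition (SVD) of $\mathbf{G}_k$ gives

$$\mathbf{G}_k = \mathbf{U}_k\, [\boldsymbol{\Sigma}_k\ \ \mathbf{0}]\, [\mathbf{V}_k\ \ \tilde{\mathbf{V}}_k]^H \tag{9}$$

where $\mathbf{U}_k \in \mathbb{C}^{N_r(K_r-1) \times N_r(K_r-1)}$, $\mathbf{V}_k \in \mathbb{C}^{M \times N_r(K_r-1)}$, $\boldsymbol{\Sigma}_k$ is an $N_r(K_r-1) \times N_r(K_r-1)$ positive
diagonal matrix and $\tilde{\mathbf{V}}_k \in \mathbb{C}^{M \times N_r}$ spans the null space of $\mathbf{G}_k$. Then (8) is satisfied by letting
$\mathbf{S}_k = \tilde{\mathbf{V}}_k \mathbf{Q}_k \tilde{\mathbf{V}}_k^H$, where $\mathbf{Q}_k \in \mathbb{C}^{N_r \times N_r}$ with $\mathbf{Q}_k \succeq \mathbf{0}$ is the new design variable. With such a
structure for $\mathbf{S}_k$, $\mathrm{rank}\{\mathbf{S}_k\} \leq N_r$ is automatically guaranteed. Then finding the optimal BD to
maximize the sum rate is equivalent to solving the following problem [15]

$$\begin{aligned}
\text{(P2):} \quad \underset{\{\mathbf{Q}_k\}}{\text{maximize}} \quad & \sum_{k=1}^{K_r} \log \left| \mathbf{I} + \mathbf{H}_k \tilde{\mathbf{V}}_k \mathbf{Q}_k \tilde{\mathbf{V}}_k^H \mathbf{H}_k^H \right| && (10) \\
\text{subject to} \quad & \sum_{k=1}^{K_r} \mathrm{Tr}\big(\mathbf{B}_j \tilde{\mathbf{V}}_k \mathbf{Q}_k \tilde{\mathbf{V}}_k^H\big) \leq P, \quad \forall j && (11) \\
& \mathbf{Q}_k \succeq \mathbf{0}, \quad \forall k && (12)
\end{aligned}$$

(P2) is convex, and hence can be solved efficiently with the standard interior point method [21] or
existing software tools such as CVX [22]. In [15], a closed-form solution is derived.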
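For intuition, consider the simplest case $N_r = 1$ and $K_r = M = 2$, where the BD condition (8) reduces to zero-forcing: the precoder of each user must be orthogonal to the channel of the other user. The sketch below is a toy example with a hypothetical channel matrix; it ignores the power allocation in (P2) and only checks that inverting the channel removes the cross terms $\mathbf{h}_i \mathbf{w}_k$, $i \neq k$. All names are illustrative assumptions.

```python
def inverse_2x2(h):
    """Inverse of a 2x2 complex matrix given as nested lists."""
    (a, b), (c, d) = h
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("channel matrix is (numerically) singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul_2x2(x, y):
    """Product of two 2x2 matrices."""
    return [[sum(x[i][m] * y[m][k] for m in range(2)) for k in range(2)]
            for i in range(2)]


def leakage(h, w):
    """Largest |h_i w_k| over i != k; zero means no inter-user interference."""
    hw = matmul_2x2(h, w)
    return max(abs(hw[0][1]), abs(hw[1][0]))


# Hypothetical channel: row i is the channel vector h_i of user i.
H = [[1 + 1j, 2 + 0j], [0.5j, 1 - 1j]]
# Column k of the inverse is an (unnormalized) zero-forcing precoder for user k.
W_zf = inverse_2x2(H)
```

With `W_zf`, `leakage(H, W_zf)` is zero up to rounding, whereas an arbitrary precoder such as the identity leaves the off-diagonal entries of `H` as interference.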

IV. Improved Precoding over BD 

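BD performs very well in the high SNR regime and achieves the same DoF as the optimal
linear/nonlinear precoding schemes [15]. However, in the low to medium SNR regime, its
performance is poor. We therefore propose an extra optimization step to improve the performance
of BD in the low to medium SNR regime, while preserving its good performance at high SNR.
Let $\{R_k^{\mathrm{BD}}\}$ be the rate tuple achieved by BD. Consider the following optimization problem

$$\begin{aligned}
\text{(P3):} \quad \underset{\rho, \{\mathbf{S}_k\}}{\text{maximize}} \quad & -\rho && (13) \\
\text{subject to} \quad & \log \frac{\left| \mathbf{I} + \sum_{i=1}^{K_r} \mathbf{H}_k \mathbf{S}_i \mathbf{H}_k^H \right|}{\left| \mathbf{I} + \sum_{i=1, i \neq k}^{K_r} \mathbf{H}_k \mathbf{S}_i \mathbf{H}_k^H \right|} \geq R_k^{\mathrm{BD}}, \quad \forall k && (14) \\
& \sum_{k=1}^{K_r} \mathrm{Tr}(\mathbf{B}_j \mathbf{S}_k) \leq \rho P, \quad \forall j && (15) \\
& \mathbf{S}_k \succeq \mathbf{0}, \quad \mathrm{rank}\{\mathbf{S}_k\} \leq N_r, \quad \forall k && (16)
\end{aligned}$$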



(P3) minimizes a common power factor $\rho$ for all BSs, while ensuring the minimum rate tuple
achieved by BD. Unlike BD, which completely eliminates inter-user interference, interference
leakage is allowed in (P3). For the special case of $N_r = 1$, (P3) can be transformed to the power
minimization problem in [23], where an equivalent second order cone programming (SOCP)
form is known. However, for the general case of $N_r \geq 2$, no convex formulation of (P3) is
known. Before solving the problem, we discuss how the solution to (P3) helps to find
an improved precoder design over BD.

Theorem 1. (P3) is guaranteed to be feasible and the solution $\{\rho^{\mathrm{opt}}, \{\mathbf{S}_k^{\mathrm{opt}}\}\}$ satisfies $\rho^{\mathrm{opt}} \leq 1$.

Proof: It is easy to see that $\{\rho = 1, \{\mathbf{S}_k^{\mathrm{BD}}\}\}$ is feasible for (P3), where $\{\mathbf{S}_k^{\mathrm{BD}}\}$ is the set of optimal
BD transmit covariance matrices. As a result, Theorem 1 follows. ■
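Although $\{\mathbf{S}_k^{\mathrm{opt}}\}$ does not strictly increase the rates over $\{R_k^{\mathrm{BD}}\}$,$^1$ the minimized power factor
$\rho^{\mathrm{opt}}$ makes it possible to effectively suppress the noise and hence enhance the effective SNR.
This can be achieved by using the new transmit covariance matrices $\mathbf{S}_k^{\mathrm{new}} = \mathbf{S}_k^{\mathrm{opt}} / \rho^{\mathrm{opt}}, \forall k$. Since
$\{\rho^{\mathrm{opt}}, \{\mathbf{S}_k^{\mathrm{opt}}\}\}$ is feasible for (P3), it is easy to see that $\{\mathbf{S}_k^{\mathrm{new}}\}$ satisfies the rank and power
constraints in (P1), i.e., $\mathrm{rank}\{\mathbf{S}_k^{\mathrm{new}}\} \leq N_r, \forall k$ and $\sum_{k=1}^{K_r} \mathrm{Tr}(\mathbf{B}_j \mathbf{S}_k^{\mathrm{new}}) \leq P, \forall j$. Furthermore, the
new achievable rate for MS$_k$ satisfies

$$\begin{aligned}
R_k^{\mathrm{new}} &= \log \frac{\left| \mathbf{I} + \sum_{i=1}^{K_r} \mathbf{H}_k \mathbf{S}_i^{\mathrm{new}} \mathbf{H}_k^H \right|}{\left| \mathbf{I} + \sum_{i=1, i \neq k}^{K_r} \mathbf{H}_k \mathbf{S}_i^{\mathrm{new}} \mathbf{H}_k^H \right|} && (17) \\
&= \log \frac{\left| \mathbf{I} + \frac{1}{\rho^{\mathrm{opt}}} \sum_{i=1}^{K_r} \mathbf{H}_k \mathbf{S}_i^{\mathrm{opt}} \mathbf{H}_k^H \right|}{\left| \mathbf{I} + \frac{1}{\rho^{\mathrm{opt}}} \sum_{i=1, i \neq k}^{K_r} \mathbf{H}_k \mathbf{S}_i^{\mathrm{opt}} \mathbf{H}_k^H \right|} && (18) \\
&\geq \log \frac{\left| \mathbf{I} + \sum_{i=1}^{K_r} \mathbf{H}_k \mathbf{S}_i^{\mathrm{opt}} \mathbf{H}_k^H \right|}{\left| \mathbf{I} + \sum_{i=1, i \neq k}^{K_r} \mathbf{H}_k \mathbf{S}_i^{\mathrm{opt}} \mathbf{H}_k^H \right|} && (19) \\
&\geq R_k^{\mathrm{BD}} && (20)
\end{aligned}$$

The last inequality follows since $\{\mathbf{S}_k^{\mathrm{opt}}\}$ satisfies (14). The second last inequality follows since
$\rho^{\mathrm{opt}} \leq 1$.

$^1$ In fact, with $\{\mathbf{S}_k^{\mathrm{opt}}\}$, the rate achieved by user $k$ equals $R_k^{\mathrm{BD}}$. This can be seen as follows. Suppose on the contrary that,
with the optimal solution $\{\rho^{\mathrm{opt}}, \{\mathbf{S}_k^{\mathrm{opt}}\}\}$, there exists a user $k$ such that $R_k > R_k^{\mathrm{BD}}$. Then we can strictly decrease the transmit
power to user $k$ so that the minimum rate constraint is still satisfied. As a consequence, the power to other users can also be
strictly decreased since the interference from user $k$ is reduced. This implies that the power factor $\rho$ can be further reduced,
which contradicts that $\rho^{\mathrm{opt}}$ is the optimal solution.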

The above relationship shows that the new set of transmit covariance matrices $\{\mathbf{S}_k^{\mathrm{new}}\}$ will at
least not decrease each user's rate over that achieved by BD. With (18), $R_k^{\mathrm{new}}$ can be interpreted
as the achievable rate by applying $\{\mathbf{S}_k^{\mathrm{opt}}\}$ in an environment with noise power $\rho^{\mathrm{opt}}$, instead of
1 as in the original system. Since $\rho^{\mathrm{opt}} \leq 1$, this implies an effective SNR enhancement by
$10 \log_{10}(1/\rho^{\mathrm{opt}})$ dB for $\{\mathbf{S}_k^{\mathrm{new}}\}$ over $\{\mathbf{S}_k^{\mathrm{opt}}\}$. Furthermore, since $\{\mathbf{S}_k^{\mathrm{opt}}\}$ performs at least as well
as $\{\mathbf{S}_k^{\mathrm{BD}}\}$ due to (14), then with $\{\mathbf{S}_k^{\mathrm{new}}\}$, an effective SNR enhancement by $10 \log_{10}(1/\rho^{\mathrm{opt}})$ dB
over BD is also guaranteed. Such SNR enhancement provides a way to compensate for the transmit
power boost problem associated with BD, and hence to increase the achievable rate. We are now
ready to present the algorithms to solve (P3).
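The step from (17) to (18) can be checked numerically in the scalar (single-stream) case, where the rate reduces to $\log_2(1 + \text{signal}/(\text{noise} + \text{interference}))$. The sketch below uses hypothetical received powers and a hypothetical power factor; the base-2 logarithm and all names are assumptions made for illustration only.

```python
import math


def rate(signal, interference, noise=1.0):
    """Single-stream rate in bits: log2(1 + signal / (noise + interference))."""
    return math.log2(1.0 + signal / (noise + interference))


def snr_enhancement_db(rho):
    """Effective SNR enhancement 10*log10(1/rho) for a power factor 0 < rho <= 1."""
    if not 0.0 < rho <= 1.0:
        raise ValueError("rho must lie in (0, 1]")
    return 10.0 * math.log10(1.0 / rho)


# Hypothetical received signal/interference powers under the minimum-power
# solution, and a hypothetical optimal power factor.
signal, interference, rho = 4.0, 1.0, 0.5

r_opt = rate(signal, interference)                  # rate before scaling, cf. (19)
r_new = rate(signal / rho, interference / rho)      # scaled covariances, cf. (17)
r_equiv = rate(signal, interference, noise=rho)     # unscaled, noise power rho, cf. (18)
```

Here `r_new` and `r_equiv` coincide, and both are at least `r_opt`; `snr_enhancement_db(0.5)` is about 3 dB.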

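A. Solve (P3) When $N_r = 1$

When each MS has a single antenna, and hence a single data stream only, BD reduces to the
well-known zero-forcing (ZF) precoding [8], [9]. Denote the channel vector to MS$_k$ as $\mathbf{h}_k \in \mathbb{C}^{1 \times M}$,
then (P3) can be equivalently formulated as the following problem [23]

$$\begin{aligned}
\text{(P4):} \quad \underset{\rho, \{\mathbf{w}_k\}}{\text{minimize}} \quad & \rho && (21) \\
\text{subject to} \quad & \frac{|\mathbf{h}_k \mathbf{w}_k|^2}{1 + \sum_{i=1, i \neq k}^{K_r} |\mathbf{h}_k \mathbf{w}_i|^2} \geq \gamma_k^{\mathrm{ZF}}, \quad \forall k && (22) \\
& \sum_{k=1}^{K_r} \big\| \mathbf{w}_k^{(j)} \big\|_2^2 \leq \rho P, \quad \forall j && (23)
\end{aligned}$$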

February 23, 2012 



DRAFT 



where $\{\gamma_k^{\mathrm{ZF}}\}$ is the SINR tuple achieved with the ZF precoding, $\mathbf{w}_k \in \mathbb{C}^{M \times 1}$ is the precoding
vector for MS$_k$ and $\mathbf{w}_k^{(j)} \in \mathbb{C}^{N_t \times 1}$ corresponds to the precoding vector for MS$_k$ used by BS$_j$.

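The above problem can be transformed into an equivalent SOCP as follows [23]

$$\begin{aligned}
\underset{\tilde{\rho}, \{\mathbf{w}_k\}}{\text{minimize}} \quad & \tilde{\rho} \\
\text{subject to} \quad & \begin{bmatrix} \sqrt{1 + 1/\gamma_k^{\mathrm{ZF}}}\ \mathbf{h}_k \mathbf{w}_k \\ [\mathbf{h}_k \mathbf{W}]^T \\ 1 \end{bmatrix} \succeq_{\mathcal{K}} \mathbf{0}, \quad \forall k \\
& \begin{bmatrix} \tilde{\rho} \sqrt{P} \\ \mathrm{vec}(\mathbf{W}^{(j)}) \end{bmatrix} \succeq_{\mathcal{K}} \mathbf{0}, \quad \forall j
\end{aligned} \tag{24}$$

where $\rho = \tilde{\rho}^2$, $\mathbf{W} = [\mathbf{w}_1\ \mathbf{w}_2\ \cdots\ \mathbf{w}_{K_r}] \in \mathbb{C}^{M \times K_r}$ and $\mathbf{W}^{(j)} = [\mathbf{w}_1^{(j)}\ \mathbf{w}_2^{(j)}\ \cdots\ \mathbf{w}_{K_r}^{(j)}] \in \mathbb{C}^{N_t \times K_r}$.
For any vector $\mathbf{y} \in \mathbb{C}^{n}$ and scalar $x \in \mathbb{R}$, $[x\ \ \mathbf{y}^T]^T \succeq_{\mathcal{K}} \mathbf{0}$ represents the second order cone constraint
$\|\mathbf{y}\|_2 \leq x$. The SOCP is convex and can be solved efficiently with software tools such as CVX [22].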
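What makes an SOCP form possible is that the SINR constraint (22) is algebraically equivalent to a norm bound: $|\mathbf{h}_k \mathbf{w}_k|^2 \geq \gamma (1 + \sum_{i \neq k} |\mathbf{h}_k \mathbf{w}_i|^2)$ holds exactly when $(1 + 1/\gamma)\,|\mathbf{h}_k \mathbf{w}_k|^2 \geq 1 + \sum_{i} |\mathbf{h}_k \mathbf{w}_i|^2$. The sketch below checks this equivalence on hypothetical link gains $|\mathbf{h}_k \mathbf{w}_i|^2$; the function names and numbers are assumptions for illustration.

```python
import math


def sinr(gains, k):
    """SINR of user k, where gains[i] = |h_k w_i|^2 and the noise power is 1."""
    interference = sum(g for i, g in enumerate(gains) if i != k)
    return gains[k] / (1.0 + interference)


def norm_bound_holds(gains, k, gamma):
    """Check ||[h_k W, 1]||_2 <= sqrt(1 + 1/gamma) * |h_k w_k|."""
    lhs = math.sqrt(1.0 + sum(gains))
    rhs = math.sqrt(1.0 + 1.0 / gamma) * math.sqrt(gains[k])
    return lhs <= rhs


# Hypothetical gains seen by user 0: desired link first, then two interferers.
gains = [4.0, 1.0, 0.5]
```

For these gains the SINR of user 0 is 1.6, so both checks pass for a target of 1.5 and both fail for a target of 2.0.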



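B. Solve (P3) When $N_r \geq 2$

When $N_r \geq 2$, no convex formulation for (P3) is known. The non-convexity arises from
the non-convex rate and rank constraints (14) and (16). In this subsection, we propose an
efficient algorithm to solve (P3) approximately. First, the rate constraints (14) are convexified
by applying the following first-order Taylor approximations$^2$

$$\log \Big| \mathbf{I} + \mathbf{H}_k \sum_{i \neq k} \mathbf{S}_i \mathbf{H}_k^H \Big| \approx \mathrm{Tr}\Big( \mathbf{H}_k \sum_{i \neq k} \mathbf{S}_i \mathbf{H}_k^H \Big) = \mathrm{Tr}\Big( \mathbf{H}_k^H \mathbf{H}_k \sum_{i \neq k} \mathbf{S}_i \Big) \tag{25}$$

$$\begin{aligned}
\log \Big| \mathbf{I} + \mathbf{H}_k \sum_{i=1}^{K_r} \mathbf{S}_i \mathbf{H}_k^H \Big| &\approx \log \big| \mathbf{I} + \mathbf{H}_k \mathbf{S}_k \mathbf{H}_k^H \big| + \mathrm{Tr}\Big[ \big( \mathbf{I} + \mathbf{H}_k \mathbf{S}_k \mathbf{H}_k^H \big)^{-1} \mathbf{H}_k \sum_{i \neq k} \mathbf{S}_i \mathbf{H}_k^H \Big] \\
&\approx \log \big| \mathbf{I} + \mathbf{H}_k \mathbf{S}_k \mathbf{H}_k^H \big| + \mathrm{Tr}\Big[ \mathbf{H}_k^H \big( \mathbf{I} + \mathbf{H}_k \mathbf{S}_k^{\mathrm{BD}} \mathbf{H}_k^H \big)^{-1} \mathbf{H}_k \sum_{i \neq k} \mathbf{S}_i \Big]
\end{aligned} \tag{26}$$

$^2$ Note that although (25) is sufficient to convexify (14), (26) is necessary to handle the non-convex rank constraints given by
(16).

where the identity $\mathrm{Tr}(\mathbf{A}\mathbf{B}) = \mathrm{Tr}(\mathbf{B}\mathbf{A})$ has been used. In (26), the gradient of the log-determinant
function at the point $\mathbf{I} + \mathbf{H}_k \mathbf{S}_k \mathbf{H}_k^H$ has been approximated as $(\mathbf{I} + \mathbf{H}_k \mathbf{S}_k^{\mathrm{BD}} \mathbf{H}_k^H)^{-1}$. With (25)
and (26), (P3) can be approximated as

$$\begin{aligned}
\text{(P5):} \quad \underset{\rho, \{\mathbf{S}_k\}}{\text{maximize}} \quad & -\rho && (27) \\
\text{subject to} \quad & \log \big| \mathbf{I} + \mathbf{H}_k \mathbf{S}_k \mathbf{H}_k^H \big| \geq \mathrm{Tr}\Big( \mathbf{F}_k \sum_{i \neq k} \mathbf{S}_i \Big) + R_k^{\mathrm{BD}}, \quad \forall k && (28) \\
& \sum_{k=1}^{K_r} \mathrm{Tr}(\mathbf{B}_j \mathbf{S}_k) \leq \rho P, \quad \forall j && (29) \\
& \mathbf{S}_k \succeq \mathbf{0}, \quad \mathrm{rank}\{\mathbf{S}_k\} \leq N_r, \quad \forall k && (30)
\end{aligned}$$

where $\mathbf{F}_k = \mathbf{H}_k^H \big[ \mathbf{I} - (\mathbf{I} + \mathbf{H}_k \mathbf{S}_k^{\mathrm{BD}} \mathbf{H}_k^H)^{-1} \big] \mathbf{H}_k$.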
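The linearization of the interference term in (25) rests on the concavity of the log-determinant: for a positive semi-definite matrix $\mathbf{X}$ with eigenvalues $e_n \geq 0$, $\log|\mathbf{I} + \mathbf{X}| = \sum_n \log(1 + e_n) \leq \sum_n e_n = \mathrm{Tr}(\mathbf{X})$, so the linearized term never underestimates the true interference term and is tight when the leakage is small. A minimal numerical sketch, using hypothetical eigenvalues of $\mathbf{H}_k \sum_{i \neq k} \mathbf{S}_i \mathbf{H}_k^H$ and the natural logarithm (both assumptions of this example):

```python
import math


def logdet_term(eigenvalues):
    """log|I + X| computed from the eigenvalues of X (natural logarithm)."""
    return sum(math.log1p(e) for e in eigenvalues)


def linearized_term(eigenvalues):
    """Tr(X), the first-order approximation of log|I + X| around X = 0."""
    return sum(eigenvalues)


def linearization_gap(eigenvalues):
    """Non-negative gap Tr(X) - log|I + X| for X positive semi-definite."""
    return linearized_term(eigenvalues) - logdet_term(eigenvalues)
```

For weak leakage such as eigenvalues `[0.01, 0.02]` the gap is below `1e-3`, while for strong leakage such as `[1.0, 2.0]` it exceeds 1, which is one way to see why the approximation is most accurate near the BD operating point.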

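It can be verified that $\{\rho = 1, \{\mathbf{S}_k^{\mathrm{BD}}\}\}$ is still feasible for (P5), so the solution $\{\rho^{**}, \{\mathbf{S}_k^{**}\}\}$
to (P5) still satisfies $\rho^{**} \leq 1$. Due to the rank constraint, (P5) is still non-convex. However,
by solving the rank-relaxed problem (denoted by (R-P5)) with the dual method, we show that
the optimal solution is guaranteed to satisfy the rank constraint, and hence it is also an optimal
solution of the non-convex problem (P5). Denote by $\{\lambda_k\}$ and $\{\mu_j\}$ the sets of dual variables of
(R-P5), which are associated with the rate constraints (28) and the per-BS power constraints (29), respectively.
Then the Lagrangian function of (R-P5) can be written as

$$\begin{aligned}
L(\rho, \{\mathbf{S}_k\}, \{\lambda_k\}, \{\mu_j\}) &= -\rho + \sum_{k=1}^{K_r} \lambda_k \Big[ \log \big| \mathbf{I} + \mathbf{H}_k \mathbf{S}_k \mathbf{H}_k^H \big| - \mathrm{Tr}\Big( \mathbf{F}_k \sum_{i \neq k} \mathbf{S}_i \Big) - R_k^{\mathrm{BD}} \Big] + \sum_{j=1}^{K_t} \mu_j \Big[ \rho P - \sum_{k=1}^{K_r} \mathrm{Tr}(\mathbf{B}_j \mathbf{S}_k) \Big] \\
&= \rho \Big( P \sum_{j=1}^{K_t} \mu_j - 1 \Big) + \sum_{k=1}^{K_r} \Big[ \lambda_k \log \big| \mathbf{I} + \mathbf{H}_k \mathbf{S}_k \mathbf{H}_k^H \big| - \mathrm{Tr}(\mathbf{C}_k \mathbf{S}_k) - \lambda_k R_k^{\mathrm{BD}} \Big]
\end{aligned} \tag{31}$$

where $\mathbf{C}_k = \sum_{j=1}^{K_t} \mu_j \mathbf{B}_j + \sum_{i=1, i \neq k}^{K_r} \lambda_i \mathbf{F}_i$. The Lagrangian dual objective is then written as

$$g(\{\lambda_k\}, \{\mu_j\}) = \max_{\mathbf{S}_k \succeq \mathbf{0}, \forall k}\ \max_{\rho}\ L(\rho, \{\mathbf{S}_k\}, \{\lambda_k\}, \{\mu_j\}) = \begin{cases} \max\limits_{\mathbf{S}_k \succeq \mathbf{0}, \forall k} L(\{\mathbf{S}_k\}, \{\lambda_k\}, \{\mu_j\}), & \text{if } \sum_{j=1}^{K_t} \mu_j = 1/P \\ \infty, & \text{otherwise} \end{cases} \tag{32}$$

where $L(\{\mathbf{S}_k\}, \{\lambda_k\}, \{\mu_j\}) \triangleq \sum_{k=1}^{K_r} \big[ \lambda_k \log | \mathbf{I} + \mathbf{H}_k \mathbf{S}_k \mathbf{H}_k^H | - \mathrm{Tr}(\mathbf{C}_k \mathbf{S}_k) - \lambda_k R_k^{\mathrm{BD}} \big]$.

Note that since $L(\rho, \{\mathbf{S}_k\}, \{\lambda_k\}, \{\mu_j\})$ is an affine function of $\rho$, $g(\{\lambda_k\}, \{\mu_j\})$ is finite only
when $\sum_{j=1}^{K_t} \mu_j = 1/P$. Since the dual variables should be chosen such that the Lagrangian
dual function is bounded, this imposes an equality constraint on the dual optimization problem of
(R-P5), which is stated as

$$\begin{aligned}
\text{(R-P5-D):} \quad \underset{\lambda_k \geq 0,\ \mu_j \geq 0,\ \forall k, j}{\text{minimize}} \quad & g(\{\lambda_k\}, \{\mu_j\}) \\
\text{subject to} \quad & \sum_{j=1}^{K_t} \mu_j = 1/P
\end{aligned} \tag{33}$$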

Since (R-P5) is convex and satisfies Slater's condition [21], the duality gap between the
optimal objective function value of (R-P5) and that of its dual (R-P5-D) is zero. Thus, the
optimal solution can be obtained by simultaneously updating the primal variables $\{\mathbf{S}_k\}$ and the
dual variables $\{\lambda_k\}$ and $\{\mu_j\}$. For a given set of dual variables $\{\lambda_k\}$ and $\{\mu_j\}$, $\{\mathbf{S}_k^*\}$ can be
updated by solving the maximization problem (32). With $\{\mathbf{S}_k^*\}$, the dual variables $\{\lambda_k\}$ and $\{\mu_j\}$
can be updated with a subgradient-based method [24].

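1) Primal Update: We first focus on solving for $\{\mathbf{S}_k^*\}$ with a given set of dual variables
$\{\lambda_k\}$ and $\{\mu_j\}$. It can be observed from (32) that the maximization of $L(\{\mathbf{S}_k\}, \{\lambda_k\}, \{\mu_j\})$ over
$\{\mathbf{S}_k\}$ can be decoupled into $K_r$ parallel sub-problems, each solving for one $\mathbf{S}_k$. By discarding
the irrelevant terms, the subproblem for solving $\mathbf{S}_k$, given $\{\lambda_k\}$ and $\{\mu_j\}$, is

$$\text{(P6):} \quad \underset{\mathbf{S}_k \succeq \mathbf{0}}{\text{maximize}} \quad \lambda_k \log \big| \mathbf{I} + \mathbf{H}_k \mathbf{S}_k \mathbf{H}_k^H \big| - \mathrm{Tr}(\mathbf{C}_k \mathbf{S}_k) \tag{34}$$

where $\mathbf{C}_k = \sum_{j=1}^{K_t} \mu_j \mathbf{B}_j + \sum_{i=1, i \neq k}^{K_r} \lambda_i \mathbf{F}_i \in \mathbb{C}^{M \times M}$.

Lemma 1. For (P6) to have a bounded objective value, the dual variables $\{\lambda_k\}$ and $\{\mu_j\}$ should
have values such that $\mathbf{C}_k$ is positive definite, i.e., $\mathbf{C}_k \succ \mathbf{0}$.

Proof: See Appendix A. ■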

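$\mathbf{C}_k$ can be decomposed as $\mathbf{C}_k = \mathbf{C}_k^{1/2} \mathbf{C}_k^{1/2}$, where $\mathbf{C}_k^{1/2}$ is Hermitian and
invertible. Furthermore, $\mathrm{Tr}(\mathbf{C}_k \mathbf{S}_k) = \mathrm{Tr}(\mathbf{C}_k^{1/2} \mathbf{S}_k \mathbf{C}_k^{1/2})$. Define $\tilde{\mathbf{S}}_k = \mathbf{C}_k^{1/2} \mathbf{S}_k \mathbf{C}_k^{1/2}$, then
$\mathbf{S}_k = \mathbf{C}_k^{-1/2} \tilde{\mathbf{S}}_k \mathbf{C}_k^{-1/2}$. Then (P6) is equivalent to

$$\underset{\tilde{\mathbf{S}}_k \succeq \mathbf{0}}{\text{maximize}} \quad \lambda_k \log \big| \mathbf{I} + \mathbf{H}_k \mathbf{C}_k^{-1/2} \tilde{\mathbf{S}}_k \mathbf{C}_k^{-1/2} \mathbf{H}_k^H \big| - \mathrm{Tr}(\tilde{\mathbf{S}}_k) \tag{35}$$

To find the optimal $\tilde{\mathbf{S}}_k$, express the (reduced) SVD of $\mathbf{H}_k \mathbf{C}_k^{-1/2} \in \mathbb{C}^{N_r \times M}$ as

$$\mathbf{H}_k \mathbf{C}_k^{-1/2} = \mathbf{U}_k \boldsymbol{\Sigma}_k \mathbf{V}_k^H \tag{36}$$

where $\mathbf{U}_k \in \mathbb{C}^{N_r \times N_r}$, $\mathbf{V}_k \in \mathbb{C}^{M \times N_r}$ and $\mathbf{U}_k^H \mathbf{U}_k = \mathbf{V}_k^H \mathbf{V}_k = \mathbf{I}_{N_r}$. $\boldsymbol{\Sigma}_k = \mathrm{Diag}(\sigma_{k,1}, \ldots, \sigma_{k,N_r})$.
Then (35) is equivalent to

$$\underset{\tilde{\mathbf{S}}_k \succeq \mathbf{0}}{\text{maximize}} \quad \lambda_k \log \big| \mathbf{I} + \mathbf{V}_k^H \tilde{\mathbf{S}}_k \mathbf{V}_k \boldsymbol{\Sigma}_k^2 \big| - \mathrm{Tr}(\tilde{\mathbf{S}}_k) \tag{37}$$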



Applying Hadamard's inequality [20], the optimal solution to (37), and hence to (35), is
given as $\tilde{\mathbf{S}}_k^* = \mathbf{V}_k \mathbf{D}_k \mathbf{V}_k^H$, where $\mathbf{D}_k = \mathrm{Diag}(d_{k,1}, \ldots, d_{k,N_r})$ with $d_{k,s}$ obtained by the standard
water-filling algorithm [20]

$$d_{k,s} = \Big( \lambda_k - \frac{1}{\sigma_{k,s}^2} \Big)^+, \quad s = 1, \ldots, N_r \tag{38}$$

where $(x)^+ = \max(0, x)$. With such results, the optimal solution to (P6) for a given set of dual
variables $\{\lambda_k\}$ and $\{\mu_j\}$ is given as

$$\mathbf{S}_k^* = \mathbf{C}_k^{-1/2} \mathbf{V}_k \mathbf{D}_k \mathbf{V}_k^H \mathbf{C}_k^{-1/2} \tag{39}$$

When the optimal solution for the dual variables $\{\lambda_k\}$ and $\{\mu_j\}$ is obtained, the corresponding
solution in (39) (now denoted by $\mathbf{S}_k^{**}$) becomes optimal for (R-P5).
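The water-filling rule (38) is straightforward to evaluate once $\lambda_k$ and the singular values $\sigma_{k,s}$ are known. The sketch below is a minimal illustration with hypothetical inputs; it also evaluates the objective of (37) restricted to the structure $\tilde{\mathbf{S}}_k = \mathbf{V}_k \mathbf{D}_k \mathbf{V}_k^H$, for which it reduces to $\lambda_k \sum_s \log(1 + d_{k,s} \sigma_{k,s}^2) - \sum_s d_{k,s}$. The natural logarithm and the function names are assumptions of this example.

```python
import math


def water_filling(lam, singular_values):
    """Power levels d_s = (lam - 1/sigma_s^2)^+ as in (38)."""
    return [max(0.0, lam - 1.0 / (s * s)) if s > 0.0 else 0.0
            for s in singular_values]


def objective(lam, singular_values, levels):
    """lam * sum_s log(1 + d_s * sigma_s^2) - sum_s d_s for a diagonal D."""
    gain = sum(math.log(1.0 + d * s * s)
               for d, s in zip(levels, singular_values))
    return lam * gain - sum(levels)


# Hypothetical dual variable and singular values of H_k C_k^{-1/2}.
lam, sigmas = 1.0, [2.0, 1.0, 0.5]
levels = water_filling(lam, sigmas)  # only the strongest mode is active here
```

In this example only the first mode receives power, and perturbing the returned levels in either direction lowers the objective, consistent with (38) being the maximizer.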



Remark 1. Since $\mathbf{V}_k \in \mathbb{C}^{M \times N_r}$, $\mathrm{rank}\{\mathbf{S}_k^{**}\} \leq N_r$ is automatically satisfied due to (39). As a
result, $\{\mathbf{S}_k^{**}\}$ is an optimal solution to the rank-constrained non-convex problem (P5) as well.
On the other hand, if (R-P5) is directly solved with software tools such as CVX [22], there is
no guarantee that the rank constraints will be satisfied.

Remark 2. With $\{\mathbf{S}_k^{**}\}$ obtained, the optimal power factor for (P5) can be calculated as

$$\rho^{**} = \max_{j \in \{1, \ldots, K_t\}} \frac{1}{P} \sum_{k=1}^{K_r} \mathrm{Tr}(\mathbf{B}_j \mathbf{S}_k^{**}) \tag{40}$$

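2) Dual Update: We now focus on solving the dual problem (R-P5-D). The dual variables
$\{\lambda_k\}$ and $\{\mu_j\}$ can be updated with a subgradient-based method after finding $\{\mathbf{S}_k^*\}$. The equality
constraint in (R-P5-D) can be eliminated by substituting $\mu_{K_t} = \frac{1}{P} - \sum_{j=1}^{K_t-1} \mu_j$, so that the problem
dimension is reduced by 1. Then the dual function after substitution of $\mu_{K_t}$ is given by

$$\begin{aligned}
g(\{\lambda_k\}, \{\mu_j\}_{j=1}^{K_t-1}) = \max_{\mathbf{S}_k \succeq \mathbf{0}, \forall k} \Big\{ & \sum_{k=1}^{K_r} \lambda_k \Big[ \log \big| \mathbf{I} + \mathbf{H}_k \mathbf{S}_k \mathbf{H}_k^H \big| - \mathrm{Tr}\Big( \mathbf{F}_k \sum_{i \neq k} \mathbf{S}_i \Big) - R_k^{\mathrm{BD}} \Big] \\
& + \sum_{j=1}^{K_t-1} \mu_j \sum_{k=1}^{K_r} \mathrm{Tr}\big[ (\mathbf{B}_{K_t} - \mathbf{B}_j) \mathbf{S}_k \big] - \frac{1}{P} \sum_{k=1}^{K_r} \mathrm{Tr}(\mathbf{B}_{K_t} \mathbf{S}_k) \Big\}
\end{aligned} \tag{41}$$

Then (R-P5-D) is equivalent to

$$\begin{aligned}
\text{(P7):} \quad \text{minimize} \quad & g(\{\lambda_k\}, \{\mu_j\}_{j=1}^{K_t-1}) \\
\text{subject to} \quad & \sum_{j=1}^{K_t-1} \mu_j \leq 1/P \\
& \mu_j \geq 0, \quad j = 1, \ldots, K_t - 1 \\
& \lambda_k \geq 0, \quad k = 1, \ldots, K_r
\end{aligned} \tag{42}$$

The subgradient of (P7) can be found with the following lemma.

Lemma 2. With the primal solution $\{\mathbf{S}_k^*\}$ given by (39) for a given set of dual variables $\{\lambda_k\}$
and $\{\mu_j\}$, the subgradient of $g(\{\lambda_k\}, \{\mu_j\}_{j=1}^{K_t-1})$ is given by

$$s_{\lambda_k} = \log \big| \mathbf{I} + \mathbf{H}_k \mathbf{S}_k^* \mathbf{H}_k^H \big| - \mathrm{Tr}\Big( \mathbf{F}_k \sum_{i \neq k} \mathbf{S}_i^* \Big) - R_k^{\mathrm{BD}}, \quad k = 1, \ldots, K_r \tag{43}$$

$$s_{\mu_j} = \sum_{k=1}^{K_r} \mathrm{Tr}\big[ (\mathbf{B}_{K_t} - \mathbf{B}_j) \mathbf{S}_k^* \big], \quad j = 1, \ldots, K_t - 1 \tag{44}$$

Proof: See Appendix B. ■

With the subgradient obtained, the dual variables can then be updated with a subgradient-based
method, such as the ellipsoid method [25].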
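To make the role of the subgradients (43) and (44) concrete, the sketch below shows one dual update for (P7). Note that it deliberately swaps in a plain subgradient step with a simple feasibility repair (clipping at zero and rescaling the $\mu$ block), which is a simpler stand-in and not the ellipsoid update used in this paper; the rescaling is also not an exact Euclidean projection. The step size, the function name and the numerical values are hypothetical.

```python
def dual_step(lam, mu, s_lam, s_mu, p_max, step):
    """One subgradient step for the dual minimization, with a feasibility repair.

    lam: current lambda_k (length K_r); mu: current mu_j, j = 1..K_t-1.
    s_lam, s_mu: subgradients as in (43) and (44).
    Returns the updated lam, mu and the implied mu_{K_t} = 1/P - sum_j mu_j.
    """
    # The dual problem is a minimization, so move against the subgradient,
    # then clip to keep the variables non-negative.
    new_lam = [max(0.0, l - step * s) for l, s in zip(lam, s_lam)]
    new_mu = [max(0.0, m - step * s) for m, s in zip(mu, s_mu)]

    # Repair the constraint sum_j mu_j <= 1/P by rescaling (a heuristic).
    budget = 1.0 / p_max
    total = sum(new_mu)
    if total > budget:
        new_mu = [m * budget / total for m in new_mu]

    mu_last = max(0.0, budget - sum(new_mu))
    return new_lam, new_mu, mu_last
```

A positive $s_{\lambda_k}$ means the rate constraint of user $k$ holds with slack, so its multiplier is decreased; a negative one increases it.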

3) Primal-dual Method for (P5): The algorithm for solving (R-P5), and hence the non-convex
problem (P5), is now summarized in Algorithm 1.

Algorithm 1 Primal-dual Method for (P5)

1: Initialize $\lambda_k \geq 0, \forall k$ and $\mu_j \geq 0$, $j \in \{1, \ldots, K_t - 1\}$, with $\sum_{j=1}^{K_t-1} \mu_j \leq 1/P$.
2: repeat
3: With $\{\lambda_k\}$ and $\{\mu_j\}_{j=1}^{K_t}$, where $\mu_{K_t} = 1/P - \sum_{j=1}^{K_t-1} \mu_j$, solve for $\{\mathbf{S}_k^*\}$ using (39).
4: Compute the subgradient of $g(\{\lambda_k\}, \{\mu_j\}_{j=1}^{K_t-1})$ using (43) and (44), then update $\{\lambda_k\}$
and $\{\mu_j\}_{j=1}^{K_t-1}$ accordingly based on the ellipsoid method [25].
5: until $\{\lambda_k\}_{k=1}^{K_r}$ and $\{\mu_j\}_{j=1}^{K_t-1}$ converge to a prescribed accuracy.
6: Then $\{\mathbf{S}_k^*\}$ approaches the optimal solution $\{\mathbf{S}_k^{**}\}$. Set $\rho^{**}$ using (40).



C. Improved Precoding over BD

Based on the previous discussions, for the given optimal BD solution (or ZF precoding when
$N_r = 1$), the following steps can be applied to find an improved linear precoder design.

Algorithm 2 Improved precoding over BD
1: Solve (P5) with Algorithm 1 (or (P4) with CVX when $N_r = 1$). Denote the solution as
$\{\{\mathbf{S}_k^{**}\}, \rho^{**}\}$ (or $\{\{\mathbf{w}_k^*\}, \rho^*\}$ when $N_r = 1$).
2: Set the proposed transmit covariance matrices as $\mathbf{S}_k^{\mathrm{prop}} = \mathbf{S}_k^{**} / \rho^{**}$ (or the proposed precoder
when $N_r = 1$ as $\mathbf{w}_k^{\mathrm{prop}} = \mathbf{w}_k^* / \sqrt{\rho^*}$), $\forall k$.
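Step 2 of Algorithm 2, together with (40), can be sketched as follows: compute the power factor as the largest per-BS power normalized by $P$, then divide every covariance matrix by it, so that the most heavily loaded BS transmits at exactly $P$. The nested-list matrix representation, the function names and the 0-based antenna indexing are illustrative assumptions.

```python
def power_factor(covariances, num_bs, antennas_per_bs, p_max):
    """rho = max_j (1/P) * sum_k Tr(B_j S_k), cf. (40)."""
    worst = 0.0
    for j in range(num_bs):
        lo, hi = j * antennas_per_bs, (j + 1) * antennas_per_bs
        # Tr(B_j S_k) is the sum of the diagonal entries of S_k for BS j.
        used = sum(s[m][m].real for s in covariances for m in range(lo, hi))
        worst = max(worst, used)
    return worst / p_max


def scale_covariances(covariances, rho):
    """Divide every covariance matrix by rho (step 2 of Algorithm 2)."""
    return [[[entry / rho for entry in row] for row in s] for s in covariances]
```

After scaling, recomputing `power_factor` on the result returns 1 up to rounding, which confirms that the per-BS constraints (6) are met with equality at the most loaded BS.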



V. Numerical Results

This section presents the numerical results. For the simulations below, the entries of the
channel matrices are independent and identically distributed (i.i.d.) circularly symmetric
complex Gaussian random variables with zero mean and unit variance. Since the noise power
is normalized, the system SNR is defined as SNR = $P$, where $P$ is the maximum power for
each BS. Algorithm 1 is terminated when the volume of the ellipsoid containing the optimal
dual variables is sufficiently small, or more specifically, when $\sqrt{\mathbf{s}^T \mathbf{E} \mathbf{s}} \leq 10^{-6}$, where $\mathbf{s}$ is the
subgradient vector and $\mathbf{E}$ is the positive definite matrix whose eigenvectors define the principal
directions of the ellipsoid.
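The stopping rule above only needs a quadratic form in the current subgradient. A minimal sketch, assuming a real-valued subgradient vector and a symmetric positive definite matrix stored as nested lists (the names and the default tolerance argument are illustrative):

```python
import math


def ellipsoid_gap(s, e):
    """sqrt(s^T E s) for a real subgradient s and a symmetric matrix E."""
    n = len(s)
    quad = sum(s[i] * e[i][j] * s[j] for i in range(n) for j in range(n))
    return math.sqrt(quad)


def should_stop(s, e, tol=1e-6):
    """Termination test of the form sqrt(s^T E s) <= tol."""
    return ellipsoid_gap(s, e) <= tol
```

As the ellipsoid shrinks, the entries of $\mathbf{E}$ decrease and the test eventually passes even when the subgradient itself does not vanish.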
A. Convergence Behavior of Algorithm 1

The convergence behavior of Algorithm 1 is illustrated with one channel realization at SNR =
10 dB. A network with $[K_t\ N_t\ K_r\ N_r] = [3\ 2\ 3\ 2]$ is simulated. The initial values of the dual
variables are assigned as $\mu_j = 1/(P K_t), \forall j$ and $\lambda_k = 0.1, \forall k$. Algorithm 1 generates a sequence
of transmit covariance matrix sets $\{\mathbf{S}_k^*\}$. The achievable sum rate with the scaled covariance
matrices $\{\mathbf{S}_k^* / \rho^*\}$ is plotted in Fig. 2, where, similar to (40),
$\rho^* = \max_{j \in \{1, \ldots, K_t\}} (1/P) \sum_{k=1}^{K_r} \mathrm{Tr}(\mathbf{B}_j \mathbf{S}_k^*)$.
Such scaling ensures that the per-BS power constraints are satisfied. The BD solution is also
plotted with a dotted line for comparison. It is observed that the algorithm eventually converges
to a fixed sum rate, which significantly outperforms the optimal BD. Similar to that in [15], the
convergence speed depends on the total number of dual variables, $K_r + K_t - 1$. With the ellipsoid
method, it is known that the complexity is of the order $O[(K_r + K_t - 1)^2]$ for large systems. It
is noted that the convergence point does not necessarily give the optimal solution, since a higher
sum rate has been observed in previous iterations. This is due to the approximations that have
been made for solving (P3). However, the algorithm does converge to a point with a sum rate
very close to the highest rate that has appeared so far, as shown in Fig. 2.

B. Sum Rate Comparison

The sum rate achieved with the proposed scheme is compared with the optimal BD, as well as
two MMSE-based precoding schemes [18], [19], denoted as "MMSE Zhang" and "MMSE Shi"
in the figure, respectively. A network with parameters $[K_t\ N_t\ K_r\ N_r] = [3\ 2\ 3\ 2]$ is simulated. The
average sum rate over 10000 channel realizations is plotted in Fig. 3. First, it is observed that
the two MMSE-based schemes, although providing some rate gain over BD at low SNR, perform
worse than BD in the high SNR regime. Furthermore, the performance degradation increases
with SNR. On the other hand, the proposed scheme outperforms the optimal BD across all SNR
ranges, and the gain is more pronounced in the low to medium SNR regime. The average value of
$1/\rho^{**}$ in dB, with $\rho^{**}$ the optimal power factor for (P5), is also plotted in Fig. 4. Since (P5) is an
approximated problem formulation of (P3), $1/\rho^{**}$ in dB can be viewed as the approximated SNR
enhancement, as discussed in Section IV. Fig. 4 verifies the sum rate gain in Fig. 3 and it also
shows that solving the non-convex problem (P3) by solving (P5) is a reasonable approximation.
VI. Conclusions

This paper proposes an improved linear precoding scheme over BD in multi-cell cooperative
downlink networks under per-BS power constraints. The performance gain is achieved by
applying an effective-SNR-enhancement technique. It is shown that by solving a power minimization
problem subject to the minimum rate constraint achieved by BD, and using the properly scaled
transmit covariance matrices at each transmitter, the system noise can be effectively suppressed
and the SNR can be enhanced. Such a technique provides a method to compensate for the transmit
power boost problem associated with BD. The power minimization problem is in general
non-convex, due to the non-convex rate and rank constraints. In order to find an efficient solution, the
rate constraint is convexified by using a Taylor approximation. Then the rank-relaxed convexified
problem is solved with the dual method. The closed-form solution shows that there is always
an optimal solution for the rank-relaxed problem such that the rank constraint is guaranteed to
be satisfied. Therefore, the solution is also optimal for the rank-constrained non-convex problem.
The proposed scheme is efficient since only one convex optimization problem needs to be
solved. Simulation results show a significant sum rate gain over the optimal BD and existing
MMSE-based schemes.



Appendix A 
Proof of Lemma 1



It can be verified that at the optimal solution to (R-P5), the inequality constraints (28) will
be active. Then based on the complementary slackness condition [21], we can assume that the
optimal dual variables $\{\lambda_k^*\}$ are positive. Therefore, we can assume that $\lambda_k > 0$ in (P6). We then
prove Lemma 1 by contradiction. Since $\mathbf{C}_k$ is Hermitian, all its eigenvalues are real. Suppose
that $\mathbf{C}_k$ has a non-positive eigenvalue, i.e., there exist $\alpha \leq 0$ and a normalized vector $\mathbf{q}$, with $\mathbf{q}^H \mathbf{q} = 1$,
such that $\mathbf{C}_k \mathbf{q} = \alpha \mathbf{q}$. Then let $\mathbf{S}_k = t \mathbf{q} \mathbf{q}^H$ with $t > 0$. Substituting into the objective function
of (P6) yields

$$\lambda_k \log \big| \mathbf{I} + t \mathbf{H}_k \mathbf{q} \mathbf{q}^H \mathbf{H}_k^H \big| - \mathrm{Tr}(t \mathbf{C}_k \mathbf{q} \mathbf{q}^H) = \lambda_k \log\big(1 + t \|\mathbf{H}_k \mathbf{q}\|_2^2\big) - \alpha t \tag{45}$$

Since $\alpha \leq 0$, as $t \rightarrow \infty$, the value of (45) becomes unbounded provided that $\mathbf{H}_k \mathbf{q} \neq \mathbf{0}$ (which
is true with probability one with independent channel realizations and the fact that $\mathbf{C}_k$ does not
depend on $\mathbf{H}_k$). Therefore, we conclude that in order to have a bounded objective value for (P6),
all eigenvalues of $\mathbf{C}_k$ should be positive. As a result, Lemma 1 follows.
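The behavior of (45) along the direction $\mathbf{q}$ can be checked numerically. The sketch below evaluates $\lambda_k \log(1 + t \|\mathbf{H}_k \mathbf{q}\|_2^2) - \alpha t$ for hypothetical values of $\lambda_k$, $\|\mathbf{H}_k \mathbf{q}\|_2^2$ and $\alpha$: for $\alpha \leq 0$ the value keeps growing with $t$, whereas for $\alpha > 0$ it peaks at the finite point $t = \lambda_k/\alpha - 1/\|\mathbf{H}_k \mathbf{q}\|_2^2$ (when this is positive). The function name, the natural logarithm and the numbers are assumptions for illustration.

```python
import math


def objective_along_q(t, lam, gain, alpha):
    """Value of (45): lam * log(1 + t * gain) - alpha * t, gain = ||H_k q||^2."""
    return lam * math.log(1.0 + t * gain) - alpha * t


def stationary_point(lam, gain, alpha):
    """Maximizer over t >= 0 when alpha > 0 (derivative set to zero, clipped at 0)."""
    if alpha <= 0.0:
        raise ValueError("the objective is unbounded in t when alpha <= 0")
    return max(0.0, lam / alpha - 1.0 / gain)
```

With `lam=1.0`, `gain=4.0` and `alpha=0.5`, the maximizer is `t = 1.75`; with `alpha=0.0` or a negative `alpha`, increasing `t` always increases the value.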



Appendix B 
Proof of Lemma 2

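A vector $\mathbf{s}$ is a subgradient of the function $g(\mathbf{x})$ at the point $\mathbf{x}$ if

$$g(\bar{\mathbf{x}}) \geq g(\mathbf{x}) + \mathbf{s}^T (\bar{\mathbf{x}} - \mathbf{x}), \quad \forall \bar{\mathbf{x}} \tag{46}$$

Or equivalently, the vector formed by $\{s_{\lambda_k}\}$ and $\{s_{\mu_j}\}$ is a subgradient of $g(\{\lambda_k\}, \{\mu_j\}_{j=1}^{K_t-1})$ if

$$g(\{\bar{\lambda}_k\}, \{\bar{\mu}_j\}_{j=1}^{K_t-1}) \geq g(\{\lambda_k\}, \{\mu_j\}_{j=1}^{K_t-1}) + \sum_{k=1}^{K_r} s_{\lambda_k} (\bar{\lambda}_k - \lambda_k) + \sum_{j=1}^{K_t-1} s_{\mu_j} (\bar{\mu}_j - \mu_j), \quad \forall \bar{\lambda}_k, \bar{\mu}_j \tag{47}$$

For the given $\{\lambda_k\}$ and $\{\mu_j\}_{j=1}^{K_t-1}$, denote by $\{\mathbf{S}_k^*\}$ the transmit covariance matrices that
achieve the maximum dual function value $g(\{\lambda_k\}, \{\mu_j\}_{j=1}^{K_t-1})$. Then $\forall \bar{\lambda}_k, \bar{\mu}_j$,

$$\begin{aligned}
g(\{\bar{\lambda}_k\}, \{\bar{\mu}_j\}_{j=1}^{K_t-1}) &= \max_{\mathbf{S}_k \succeq \mathbf{0}, \forall k} \Big\{ \sum_{k=1}^{K_r} \bar{\lambda}_k \Big[ \log \big| \mathbf{I} + \mathbf{H}_k \mathbf{S}_k \mathbf{H}_k^H \big| - \mathrm{Tr}\Big( \mathbf{F}_k \sum_{i \neq k} \mathbf{S}_i \Big) - R_k^{\mathrm{BD}} \Big] \\
&\qquad + \sum_{j=1}^{K_t-1} \bar{\mu}_j \sum_{k=1}^{K_r} \mathrm{Tr}\big[ (\mathbf{B}_{K_t} - \mathbf{B}_j) \mathbf{S}_k \big] - \frac{1}{P} \sum_{k=1}^{K_r} \mathrm{Tr}(\mathbf{B}_{K_t} \mathbf{S}_k) \Big\} && (48) \\
&\geq \sum_{k=1}^{K_r} \bar{\lambda}_k \Big[ \log \big| \mathbf{I} + \mathbf{H}_k \mathbf{S}_k^* \mathbf{H}_k^H \big| - \mathrm{Tr}\Big( \mathbf{F}_k \sum_{i \neq k} \mathbf{S}_i^* \Big) - R_k^{\mathrm{BD}} \Big] \\
&\qquad + \sum_{j=1}^{K_t-1} \bar{\mu}_j \sum_{k=1}^{K_r} \mathrm{Tr}\big[ (\mathbf{B}_{K_t} - \mathbf{B}_j) \mathbf{S}_k^* \big] - \frac{1}{P} \sum_{k=1}^{K_r} \mathrm{Tr}(\mathbf{B}_{K_t} \mathbf{S}_k^*) && (49) \\
&= g(\{\lambda_k\}, \{\mu_j\}_{j=1}^{K_t-1}) + \sum_{k=1}^{K_r} s_{\lambda_k} (\bar{\lambda}_k - \lambda_k) + \sum_{j=1}^{K_t-1} s_{\mu_j} (\bar{\mu}_j - \mu_j) && (50)
\end{aligned}$$

where equality (48) follows from (41), and inequality (49) follows since $g(\{\bar{\lambda}_k\}, \{\bar{\mu}_j\}_{j=1}^{K_t-1})$ is the
maximum value over all $\mathbf{S}_k \succeq \mathbf{0}$ for the given dual variables $\{\bar{\lambda}_k\}$ and $\{\bar{\mu}_j\}_{j=1}^{K_t-1}$. $s_{\lambda_k}$ and $s_{\mu_j}$ are
given by (43) and (44), respectively. Equality (50) is obtained by using $\bar{\lambda}_k = (\bar{\lambda}_k - \lambda_k) + \lambda_k$, $\bar{\mu}_j =
(\bar{\mu}_j - \mu_j) + \mu_j$ and the fact that $\{\mathbf{S}_k^*\}$ achieves the maximum value $g(\{\lambda_k\}, \{\mu_j\}_{j=1}^{K_t-1})$. Then
together with (47), Lemma 2 follows.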
Fig. 1: System Model

Fig. 2: Convergence behavior of Algorithm 1. $[K_t\ N_t\ K_r\ N_r] = [3\ 2\ 3\ 2]$, SNR = 10 dB. Dashed line shows the
achieved sum rate with BD.

Fig. 3: Average sum rate for various schemes, $[K_t\ N_t\ K_r\ N_r] = [3\ 2\ 3\ 2]$.